Abstract: The server-centric data centre network architecture
can accommodate a wide variety of network topologies. Newly 
proposed topologies in this arena often require several rounds of 
analysis and experimentation in order that they might achieve 
their full potential as data centre networks. We propose a family 
of novel routing algorithms on two well-known data centre 
networks of this type, (Generalized) DCell and FiConn, using
techniques that can be applied more generally to the class 
of networks we call completely connected recursively-defined 
networks. In doing so, we develop a classification of all possible 
routes from server-node to server-node on these networks, called 
general routes of order t, and find that for certain topologies of 
interest, our routing algorithms efficiently produce paths that are 
up to 16% shorter than the best previously known algorithms, 
and are comparable to shortest paths. In addition to finding 
shorter paths, we show evidence that our algorithms also have 
good load-balancing properties. 

I. Introduction 

The explosive growth of online services powered by data 
centres (web search, cloud computing, etc.) has motivated 
intense research into data centre network (DCN) design over 
the past decade and brought about major breakthroughs. For
example, fat-tree DCNs, introduced in III , use commodity off- 
the-shelf (COTS) servers and switches in a fat-tree (topology), 
and have resulted in an evolutionary shift in production 
data centres towards leaf-spine topologies, built from COTS 
hardware. COTS fat-tree DCNs are not a panacea, however; 
for example, fat-trees are difficult to scale. 

Research on DCN architecture is ongoing and each new 
architecture invites the use of certain classes of topologies. 
Indirect networks, where servers are the terminals connected 
to a switching fabric, are the prevailing example. Fat-trees
are among the topologies that can be implemented in indirect 
network architectures. A host of alternative topologies can be 
implemented as indirect networks, including random regular 
graphs (Ill) and butterfly networks (lH). Likewise, the optical- 
switch hybrid DCN Helios ([4]) can be seen as an architecture
with the capacity to accommodate a variety of topologies (both 
in the wired links as well as in the optical switch itself). Each 
architecture sets constraints on the topology in a variety of 
ways; for example, by the separation of switching nodes from 
server nodes or the number of ports in the available hardware. 

The server-centric DCN (SCDCN) architecture, introduced
in [5], accommodates a great variety of network topologies and
has resulted in a number of new DCN designs, both derived
from existing and well-understood topologies in interconnection
networks as well as topologies geared explicitly towards
DCNs (e.g., 0-IIO1). 

Only dumb crossbar-like switches are used in an SCDCN 
and the servers are responsible for routing packets through the 
network. Therefore, the switches have no knowledge of the 
network topology and are only connected to servers. Servers, 
on the other hand, may be connected to both switches and 
servers. These parameters, which make up part of the SCDCN 
architecture, invite sophisticated topologies from abstractions 
as graphs, along with accompanying analyses. We are concerned
primarily with routing algorithms for two well-known
SCDCNs, DCell ([5]) and FiConn ([6]), and the topologies
called Generalized DCell ([11], [12]).

We characterise (Generalized) DCell and FiConn as a special
case of completely connected recursively-defined networks
(CCRDNs), which we use to develop a classification (which,
to our knowledge, is novel) of all possible routes from
server-node to server-node in the DCNs (Generalized) DCell and
FiConn. Our main result pertains to a specific family of
routing algorithms, called PR (or ProxyRoute), which we
develop with the primary aim of improving upon the originally
proposed (and best known) routing algorithms, as regards
hop-length. This goal is achieved with improvements as high as
16% for certain topologies and paths that are comparable, in
length, to shortest paths. In addition, we give empirical
evidence that the path diversity provided by PR does a better job
of balancing load than DCellRouting. Hitherto, the only
algorithms for balancing communication load in (Generalized)
DCell and FiConn have been the adaptive routing algorithms DFR and
TAR presented in [5] and [6], so PR is also novel in this respect.

Two of our instances of PR, called GP_I and GP_0, exploit
the topological structure of (Generalized) DCell and FiConn in
order to find short paths efficiently by means of an intelligent
search (see Section V-A) of sub-structures called “proxies”.
We then empirically compare the results of our intelligent
versions of PR with a shortest path algorithm, a brute force
version of PR and the routing algorithms that were originally
proposed in [5], [6], [12].

We give definitions in Sections II and III, where we abstract the
DCNs (Generalized) DCell and FiConn as graphs which can
be characterised as CCRDNs. Section IV describes previously
known routing algorithms for these DCNs, in the context of
CCRDNs, and our classification of routes in CCRDNs is given
in Section IV-B as general routes of order t. We present
our main contribution in Section V: the design of PR. Our
empirical work is described and evaluated in Section VI, and
future avenues for research are identified in the conclusion.

II. Server-centric DCNs 

Our results and experiments are concentrated on graph 
theoretical abstractions of certain SCDCNs. Therefore, it is 
appropriate that we define this abstraction precisely. 

An SCDCN consists of switches, which act only as crossbars
and have no routing intelligence, and servers. These
components are linked together, with the only restriction being
that a switch cannot be linked directly to another switch; we
assume all links are bidirectional. As such, an SCDCN is
abstracted here by an undirected graph $G = (W \cup S, E)$, with
two types of nodes called switch-nodes, $W$, and server-nodes,
$S$. Naturally, each switch of the SCDCN corresponds to a
switch-node, $w \in W$, and each server corresponds to a
server-node, $x \in S$. Each link of the SCDCN corresponds to an edge
$e$ of $E$, which, for convenience, we shall also call a link. The
condition that switch-to-switch links are not allowed implies
that there is no $(u, v) \in E$ such that $u, v \in W$. See Ha for undefined
graph-theoretic terms.

Also relevant to our discussion of routing algorithms in
SCDCNs is the fact that (1) packets are sent and received only
by servers, and (2) packets endure a negligible amount of
processing time in each switch, compared to the time spent in
each server. The reason for (2) is that we assume the packet is
routed in the server’s operating system, either via a table look-up
or computation. This could be done, e.g., by a dedicated
virtual machine or a specialised hypervisor with the capability 
to route packets. In any case, we may assume that with today’s 
COTS servers, a packet spends much more time at servers than 
in switches. 

The outcome of (1) is that we need only discuss routing 
algorithms that construct paths whose endpoints are server- 
nodes. That is, a route on G is a path whose endpoints are 
server-nodes. The outcome of (2) is that a hop from server- 
node to server-node is indistinguishable from one that also 
passes through a switch-node. 
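
The two consequences above can be illustrated with a minimal sketch. The toy network, the node names, and the helpers `is_valid_scdcn` and `hop_length` are assumptions made purely for illustration; they are not taken from any DCN implementation.

```python
# Hypothetical toy SCDCN: one switch-node "w0" joined to three server-nodes,
# plus one direct server-to-server link ("s2", "s3").
switches = {"w0"}
servers = {"s0", "s1", "s2", "s3"}
edges = {("w0", "s0"), ("w0", "s1"), ("w0", "s2"), ("s2", "s3")}


def is_valid_scdcn(switches, edges):
    """Return True if no link joins two switch-nodes."""
    return not any(u in switches and v in switches for u, v in edges)


def hop_length(path, switches):
    """Count server-to-server hops on a route; switch-nodes are treated as transparent."""
    if path[0] in switches or path[-1] in switches:
        raise ValueError("a route must start and end at server-nodes")
    return sum(1 for v in path if v not in switches) - 1


route = ["s0", "w0", "s2", "s3"]
print(is_valid_scdcn(switches, edges))  # True
print(hop_length(route, switches))      # 2
```

Here the hop from s0 to s2 through the switch-node counts exactly as much as the direct hop from s2 to s3, which is the convention used for hop-length throughout.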

III. Recursively-Defined Networks 

Our results are concerned with network topologies of a 
certain form that have arisen frequently in the area of interconnection
networks, and recently as SCDCNs.

Definition III.1. A family $X = \{X(h) : h = 0, 1, \ldots\}$ of
interconnection networks is recursively-defined if $X(h)$, where
$h > 0$, is the disjoint union of copies of $X(h-1)$ with the
addition of extra links joining nodes in the different copies. We
call a member of $X$ a recursively-defined network (RDN). A
family of RDNs $X$ is a completely-connected RDN (CCRDN)
if there is at least one link joining every copy
of $X(h-1)$ within $X(h)$ to every other copy.


A. The DCNs DCell

The DCNs DCell ([5]) were the first family of SCDCNs to
be proposed, and their graphs form the family of CCRDNs
described below.

Fix some $n > 2$. The graph DCell$_{0,n}$ consists of one switch-node
connected to $n$ server-nodes. For $k \geq 0$, let $t_k$ be the
number of server-nodes in DCell$_{k,n}$. For $k > 0$, the graph
DCell$_{k,n}$ consists of $t_{k-1} + 1$ disjoint copies of DCell$_{k-1,n}$,
labelled $D^i_{k-1}$, for $0 \leq i \leq t_{k-1}$. Each pair of distinct
DCell$_{k-1,n}$s is joined by exactly one link, called a level-$k$
link, whose exact definition is given below, in terms of the
labels of the server-nodes.

Label a server-node of a DCell$_{k,n}$, for some $k > 0$, by
$x = x_k x_{k-1} \cdots x_0$, where $x_{k-1} x_{k-2} \cdots x_0$ is the label of a
server-node in $D^{x_k}_{k-1}$, and $0 \leq x_0 < n$ and $0 \leq x_i < g_i$
for $i > 0$, where $g_i = t_{i-1} + 1$. The labels of DCell$_{k,n}$ are
mapped bijectively to the set $\{0, 1, \ldots, t_k - 1\}$ by
$uid_k(x) = x_k t_{k-1} + x_{k-1} t_{k-2} + \cdots + x_1 t_0 + x_0$. Label and uid are
combined in the notation $[x_k, uid_{k-1}(x_{k-1} x_{k-2} \cdots x_0)]$.

Let $0 \leq x_k < y_k < t_{k-1} + 1$ be the indices of the
DCell$_{k-1,n}$s labelled $D^{x_k}_{k-1}$ and $D^{y_k}_{k-1}$. A level-$k$ link connects
node $y_k - 1$ in $D^{x_k}_{k-1}$ to node $x_k$ in $D^{y_k}_{k-1}$. This is the link
$(y_k - 1 + x_k t_{k-1}, \; x_k + y_k t_{k-1})$.
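
The definition above can be exercised with a short sketch. It assumes the reading of the connection rule in which copies $x < y$ are joined by the link $(y - 1 + x t_{k-1}, \; x + y t_{k-1})$ on uid values; the function names `dcell_sizes`, `dcell_link` and `level_links` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def dcell_sizes(k, n):
    """Return (t, g): t[i] = number of server-nodes in DCell_{i,n}, g[i] = t[i-1] + 1."""
    t, g = [n], [1]  # g[0] = 1 is a placeholder: level 0 is a single copy
    for i in range(1, k + 1):
        g.append(t[i - 1] + 1)
        t.append(g[i] * t[i - 1])
    return t, g


def dcell_link(x, y, t_prev):
    """Level-k link joining copies x < y, as a pair of uid_k values."""
    assert x < y
    return (y - 1 + x * t_prev, x + y * t_prev)


def level_links(k, n):
    """All level-k links of DCell_{k,n}, one per pair of copies of DCell_{k-1,n}."""
    t, g = dcell_sizes(k, n)
    return [dcell_link(x, y, t[k - 1]) for x, y in combinations(range(g[k]), 2)]


t, g = dcell_sizes(2, 3)
links = level_links(2, 3)
endpoints = sorted(u for link in links for u in link)
print(t, len(links))                   # [3, 12, 156] 78
print(endpoints == list(range(t[2])))  # True: every server-node has exactly one level-2 link
```

The final check confirms, for DCell$_{2,3}$, that the level-2 links cover each of the 156 server-nodes exactly once.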

1) Generalized DCell: The definition of the DCNs DCell
generalises readily; see [11], [12]. The key observation is that
the level-$k$ links are a perfect matching of the server-nodes
in the disjoint copies of the DCell$_{k-1,n}$s, where every pair
of distinct DCell$_{k-1,n}$s is connected by a link. Many such
matchings are possible. A given matching $\rho_k$ which satisfies
the stated properties defines the level-$k$ links and is called a
$\rho_k$-connection rule ([12]).

A Generalized DCell$_{k,n}$ inherits the definition of DCell$_{k,n}$,
for $k \geq 0$, except that the level-$k$ links may satisfy an arbitrary
$\rho_k$-connection rule. Note that we insist that there be only one
connection rule for each level $k$, so that a given family of
Generalized DCells can be specified by a set of connection
rules $\{\rho_1, \rho_2, \rho_3, \ldots\}$.

This is in accordance with Definition 1 in [12], with two
exceptions. We model Generalized DCell$_{0,n}$ as a switch-node
connected to $n$ server-nodes, rather than modelling it as $K_n$,
and we require $n > 2$.

In order to demonstrate the impact of different connection
rules on the routing algorithms presented in Section IV, it
suffices to consider just one connection rule besides the one
for DCell. For this purpose, we use $\beta$-DCell, defined by the
$\beta$-connection rule given in [12].

The $\beta$-connection rule is (perhaps not obviously) as follows:
Let $0 \leq x_k < y_k < t_{k-1} + 1$ be the indices of the
$\beta$-DCell$_{k-1,n}$s labelled $B^{x_k}_{k-1}$ and $B^{y_k}_{k-1}$. A level-$k$ link connects
node $y_k - x_k - 1$ in $B^{x_k}_{k-1}$ to node $t_{k-1} - y_k + x_k$ in $B^{y_k}_{k-1}$. This
is the link $(y_k - x_k - 1 + x_k t_{k-1}, \; t_{k-1} - y_k + x_k + y_k t_{k-1})$.
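
A connection rule can be treated as a function from a pair of copy indices to a pair of uid values, which makes the "perfect matching" requirement easy to test. The sketch below does this for the DCell rule and the $\beta$ rule as read from the text; the names `dcell_rule`, `beta_rule` and `is_connection_rule` are illustrative assumptions.

```python
from itertools import combinations


def dcell_rule(x, y, t_prev):
    """DCell rule: node y-1 of copy x is joined to node x of copy y (x < y)."""
    return (y - 1 + x * t_prev, x + y * t_prev)


def beta_rule(x, y, t_prev):
    """Beta rule: node y-x-1 of copy x is joined to node t_prev-y+x of copy y (x < y)."""
    return (y - x - 1 + x * t_prev, t_prev - y + x + y * t_prev)


def is_connection_rule(rule, t_prev):
    """Check that `rule` joins every pair of copies and uses every server-node exactly once."""
    g = t_prev + 1  # number of copies of the level-(k-1) network
    endpoints = []
    for x, y in combinations(range(g), 2):
        u, v = rule(x, y, t_prev)
        if u // t_prev != x or v // t_prev != y:
            return False  # an endpoint fell outside the copy it should belong to
        endpoints += [u, v]
    return sorted(endpoints) == list(range(g * t_prev))


print(is_connection_rule(dcell_rule, 12))                # True
print(is_connection_rule(beta_rule, 12))                 # True
print(dcell_rule(0, 1, 12), beta_rule(0, 1, 12))         # (0, 12) (0, 23)
```

Both rules pass the check for $t_{k-1} = 12$ yet place the same link on different server-nodes, which is what makes the two topologies behave differently under routing.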

B. The DCNs FiConn 

One of the issues with (Generalized) DCell$_{k,n}$ is that each
server-node has degree $k + 1$. This requires that each server
has $k + 1$ NIC ports, which is not typically the case for COTS
servers when $k > 1$.

FiConn, proposed in [6], is a CCRDN that requires at most
two ports per server; it uses only half of the available server-nodes
(those of degree one) in each copy of FiConn$_{k-1,n}$
when building FiConn$_{k,n}$. This, in turn, leaves server-nodes
of degree one available to build the next level. We describe
FiConn below.

Fix some even $n > 3$. FiConn$_{0,n}$ is the network consisting
of one switch-node connected to $n$ server-nodes. Let $b$ be
the number of available server-nodes in FiConn$_{k-1,n}$ for
$k > 0$. Build FiConn$_{k,n}$ from $b/2 + 1$ copies of FiConn$_{k-1,n}$,
labelled $F^i_{k-1}$, for $0 \leq i \leq b/2$. From [6] we have that
$b/2 + 1 = t_{k-1}/2^k + 1$, so that the label of a server-node
$x$ of a FiConn$_{k,n}$ is expressed as the $(k+1)$-tuple
$x = x_k x_{k-1} \cdots x_0$, where $x_{k-1} x_{k-2} \cdots x_0$ is a server-node
in $F^{x_k}_{k-1}$ and we have $0 \leq x_0 < n$, but $0 \leq x_k < g_k$, where
$g_k = b/2 + 1 = t_{k-1}/2^k + 1$ (diverging slightly from the
labels in DCell). We have
$uid_k(x) = x_k t_{k-1} + x_{k-1} t_{k-2} + \cdots + x_1 t_0 + x_0$
and $[x_k, uid_{k-1}(x_{k-1} x_{k-2} \cdots x_0)]$ to label
server-nodes, once more.

Let $0 \leq x_k < y_k < t_{k-1}/2^k + 1$ be the indices of the
FiConn$_{k-1,n}$s $F^{x_k}_{k-1}$ and $F^{y_k}_{k-1}$. A level-$k$ link connects
server-node $(y_k - 1)2^k + 2^{k-1} + 1$ in $F^{x_k}_{k-1}$ to server-node
$x_k 2^k + 2^{k-1} + 1$ in $F^{y_k}_{k-1}$. This is the link
$((y_k - 1)2^k + 2^{k-1} + 1 + x_k t_{k-1}, \; x_k 2^k + 2^{k-1} + 1 + y_k t_{k-1})$.
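
The recurrence $g_k = t_{k-1}/2^k + 1$, $t_k = g_k t_{k-1}$ determines the size of every FiConn$_{k,n}$. The sketch below evaluates it; `ficonn_sizes` is a hypothetical helper name, and the code deliberately covers only the counting argument, not the level-$k$ links.

```python
def ficonn_sizes(k, n):
    """Return (t, g): t[i] = number of server-nodes in FiConn_{i,n}, g[i] = t[i-1]/2^i + 1."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    t, g = [n], [1]  # g[0] = 1 is a placeholder: level 0 is a single copy
    for i in range(1, k + 1):
        if t[i - 1] % 2**i != 0:
            raise ValueError("t[i-1] is not divisible by 2^i for this n")
        g.append(t[i - 1] // 2**i + 1)  # b/2 + 1 copies, with b = t[i-1] / 2^(i-1)
        t.append(g[i] * t[i - 1])
    return t, g


print(ficonn_sizes(2, 36))  # ([36, 684, 117648], [1, 19, 172])
print(ficonn_sizes(3, 10))  # ([10, 60, 960, 116160], [1, 6, 16, 121])
```

The values for FiConn$_{2,36}$ and FiConn$_{3,10}$ agree with the server counts and $g_i$ values listed for these networks in Table II.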

IV. Routing 

CCRDNs feature a class of routing algorithms that emerges 
naturally from their definition, called dimensional routing. 

A. Dimensional routing 

Definition IV.1. Let $X = \{X(h) : h = 0, 1, \ldots\}$ be a family of
CCRDNs, and let $X_h$ be a copy of $X(h)$, for some fixed $h > 0$.
Let $X^a_{h-1}$ and $X^b_{h-1}$ be disjoint copies of $X(h-1)$ in $X_h$,
and let src and dst be nodes of $X^a_{h-1}$ and $X^b_{h-1}$, respectively.
Since $X_h$ is completely connected, there is a level-$h$ link in $X_h$
incident with a node dst' in $X^a_{h-1}$ and a node src' in $X^b_{h-1}$.
If $h - 1 = 0$ then either src = dst' or (src, dst') is a link,
and otherwise a path $P_a$ from src to dst' can be recursively
computed in $X^a_{h-1}$. This same method provides a path $P_b$ from
src' to dst in $X^b_{h-1}$. A dimensional routing algorithm on $X$ is
one which computes paths of the form $P_a + (dst', src') + P_b$,
between any source-destination pair of nodes in a member of
$X$, and is denoted $DR_X$. A dimensional route is one that can
be computed by a dimensional routing algorithm.
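
A minimal sketch of dimensional routing, specialised to DCell and working directly on uid values, is given below. It assumes the DCell connection rule of Section III-A and counts a hop through the level-0 switch-node as a single server-to-server hop; the names `dcell_t`, `link` and `dr` are illustrative and are not claimed to match any published implementation.

```python
def dcell_t(k, n):
    """t[i] = number of server-nodes in DCell_{i,n}."""
    t = [n]
    for _ in range(k):
        t.append((t[-1] + 1) * t[-1])
    return t


def link(a, b, tp):
    """Endpoints (in copy a, in copy b) of the unique link joining copies a != b."""
    x, y = min(a, b), max(a, b)
    u, v = y - 1 + x * tp, x + y * tp  # DCell rule for copies x < y
    return (u, v) if a < b else (v, u)


def dr(src, dst, k, t):
    """Dimensional route from uid src to uid dst in DCell_{k,n}, as a list of server uids."""
    if src == dst:
        return [src]
    if k == 0:
        return [src, dst]  # one hop through the switch-node
    tp = t[k - 1]
    a, b = src // tp, dst // tp
    if a == b:  # same copy: route one level down, then restore the offset
        return [a * tp + w for w in dr(src % tp, dst % tp, k - 1, t)]
    dst_prime, src_prime = link(a, b, tp)
    return dr(src, dst_prime, k, t) + dr(src_prime, dst, k, t)  # P_a + (dst', src') + P_b


t = dcell_t(2, 3)
path = dr(7, 19, 2, t)
print(path, len(path) - 1)  # [7, 6, 1, 0, 12, 13, 18, 19] 7
```

The example pair attains the maximum dimensional-route length of $2^{k+1} - 1 = 7$ hops for $k = 2$.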

Remarkably (and, perhaps, unfortunately), there are topologies
and source-destination pairs for which no dimensional
routing algorithm computes a shortest path; a notable example 
is the family of WK-recursive networks ( ifTSl ). for which a 
shortest path algorithm is developed in IIThl . 

1) Dimensional routing in (Generalized) DCell and
FiConn: (Generalized) DCell and FiConn are CCRDNs in
which each pair of disjoint copies of DCell$_{k-1,n}$ within
DCell$_{k,n}$ is joined by exactly one edge. As such, there is only
one choice for the edge (dst', src'), which is computed by the
connection rule for level-$k$ links. Therefore, the connection
rules in Sections III-A and III-B suffice to describe dimensional
routing for these DCNs.

The dimensional routing algorithm for each of these networks
serves as a basis for the fault-tolerant and load-balancing
routing algorithms DFR in [5] and TAR in [6], and it is
precisely the algorithm called Generalized DCellRouting,
given in [12]. The former two are fault- and congestion-tolerant
routing algorithms that compute significantly longer paths, on
average, than the dimensional routing algorithms.

B. Proxy Routing 

A general routing algorithm on a family $X = \{X(h) : h = 0, 1, \ldots\}$
of CCRDNs is of the following form. Let $X_h$
be a copy of $X(h)$, for some fixed $h > 0$. Let $X^{c_0}_{h-1}$ and
$X^{c_{t-1}}_{h-1}$ be disjoint copies of $X(h-1)$ in $X_h$, with $src_{c_0}$
and $dst_{c_{t-1}}$ nodes of $X^{c_0}_{h-1}$ and $X^{c_{t-1}}_{h-1}$, respectively. Let
$X^{c_0}_{h-1}, X^{c_1}_{h-1}, \ldots, X^{c_{t-1}}_{h-1}$ be a sequence of copies of $X(h-1)$,
where: $c_0 = a$; $c_{t-1} = b$; $c_i \neq c_{i+1}$, for $0 \leq i < t - 1$; and
$X^{c_i}_{h-1}$ is disjoint from $X^{c_j}_{h-1}$ whenever $c_i \neq c_j$. Let $(dst_{c_i}, src_{c_{i+1}})$
be a link from $X^{c_i}_{h-1}$ to $X^{c_{i+1}}_{h-1}$, and let $P_i$ be paths in each
$X^{c_i}_{h-1}$ from $src_{c_i}$ to $dst_{c_i}$.

Every routing algorithm computes a path (we shall assume
that there are no repeated nodes) of the form
$P_0 + (dst_{c_0}, src_{c_1}) + P_1 + \cdots + (dst_{c_{t-2}}, src_{c_{t-1}}) + P_{t-1}$.

A general route of order $T$ is one in which $t \leq T$ for each
$X(h)$, with $h = 0, 1, \ldots$, and $t = T$ for at least one of these.
A proxy route, computed by a proxy routing algorithm, is a
general route of order 3 (and a dimensional route is of order
2).

1) DFR for DCell and TAR for FiConn: While we do
not provide full details here, we sketch the proxy-routing-like
subroutine that is common to DFR ([5]) and TAR ([6]). Both
DFR and TAR are adaptive routing algorithms which compute
paths in a distributed manner, making decisions on the fly,
based on information that is local to the current location of
the packet being routed.

This subroutine computes a part of a proxy route to replace
a sub-path of the intended route. In particular, a packet may
bypass a level-$m$ link, $e$, from sub-structure $D^a_{m-1}$ to $D^b_{m-1}$
by re-routing through a proxy $D^c_{m-1}$, with $a$, $b$, and $c$ distinct.
The decision to bypass is made when the packet arrives at $e$
(or near $e$, as determined by a parameter in DFR), and upon
its arrival in $D^c_{m-1}$, the packet is routed directly to its final
destination.

The algorithms DFR and TAR produce much longer paths than DR,
on average. The simulations in [5] show that DFR, although
fault-tolerant, computes paths that are over 10% longer than
the shortest paths, on average, even with as little as 2%
failures. The maximum length of a route computed by the
implementation of TAR in [6] (Theorem 7) is $2 \cdot 3^k - 1$, whilst
it is $2 \cdot 2^k - 1$ for DR (called TOR in [6]). This is reflected
in their simulations of random and burst traffic, where TAR
computes paths that are 15-30% longer, on average, than those
computed by DR.




V. Proxy routing in DCell and FiConn 

We propose that proxy routing be used more broadly 
than it is in DFR and TAR, and with the primary goal of 
efficiently computing short paths, rather than fault-tolerance 
and balancing load, by applying it in a fundamentally different 
manner; firstly, we seek to compute a proxy route at the outset, 
rather than building the route piecemeal; secondly, we use this 
pre-planning in order to find a proxy route that offers a high 
degree of savings over the dimensional route. 

One reason for focusing on $t \leq 3$ is that visiting each
$D^{c_j}_{m-1}$, $0 < j < t - 1$, has an associated cost, and
when $m$ is small, as it is when our graphs represent DCNs
with a realistically deployable number of servers, it becomes
less likely that general routes with $t > 3$ will be useful.
Furthermore, the methods of searching for a “good” proxy that
we explore here may become impractical for $t > 3$, because
the search space of potential (multiple) proxies is much larger.

Henceforth we use G-Cell in place of (Generalized) DCell
and FiConn whenever we make statements or arguments that
apply to all of these.

The following lower bound on the hop-length of a general 
route of order t is obvious. 

Lemma V.1. Let src and dst be server-nodes in a G-Cell$_{k,n}$,
with $k > 0$, such that src is in $D^a_{k-1}$ and dst is in $D^b_{k-1}$, with
$a \neq b$. A general route of order $t$ has length at least $2t - 3$.
In particular, a dimensional route has length at least 1 and a
proxy route has length at least 3.
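
One way to see the bound is sketched below. The argument assumes that every server-node is incident with at most one level-$k$ link, which holds for the connection rules of Section III because the level-$k$ links form a matching; the decomposition into two summands is our own illustrative phrasing rather than a proof taken from the text.

```latex
% A general route of order t visits t copies of G-Cell_{k-1,n} at the top level.
% (i)  Moving between consecutive copies uses t - 1 level-k links.
% (ii) In each of the t - 2 intermediate copies the route enters and leaves
%      by different level-k links, hence at different server-nodes (matching
%      assumption), so it spends at least one hop inside that copy.
\begin{align*}
  \text{length}
    &\ge \underbrace{(t - 1)}_{\text{level-}k\text{ links}}
       + \underbrace{(t - 2)}_{\text{hops inside intermediate copies}} \\
    &= 2t - 3 .
\end{align*}
% t = 2 (dimensional route): 2 \cdot 2 - 3 = 1.
% t = 3 (proxy route):       2 \cdot 3 - 3 = 3.
```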

The remainder of our paper is a comparative empirical
analysis of several versions of PR, given in Algorithm 1.


Algorithm 1 PR for G-Cell returns a proxy route if it finds
one that is shorter than the corresponding dimensional route.
Require: src and dst are server-nodes in a G-Cell$_{k,n}$.
function PR(src, dst, m)
    if m > 0 and both src and dst are in the same
    copy of G-Cell$_{m-1,n}$ then
        return PR(src, dst, m - 1)
    end if
    $D^c_{m-1}$ <- GP(src, dst, m).
    if $D^c_{m-1}$ = null then
        return DR(src, dst).
    else
        $D^a_{m-1}$ <- the G-Cell$_{m-1,n}$ containing src.
        $D^b_{m-1}$ <- the G-Cell$_{m-1,n}$ containing dst.
        $(a^c, c^a)$ <- the link from $D^a_{m-1}$ to $D^c_{m-1}$.
        $(c^b, b^c)$ <- the link from $D^c_{m-1}$ to $D^b_{m-1}$.
        return
            PR(src, $a^c$, m - 1) + $(a^c, c^a)$ +
            PR($c^a$, $c^b$, m - 1) + $(c^b, b^c)$ +        (1)
            PR($b^c$, dst, m - 1).
    end if
end function


A. GP: GetProxy

GP is the subroutine of PR that computes the proxy used
in Expression (1), if a proxy is to be used. That is, GP
returns either a proxy sub-G-Cell, $D^c_{m-1}$, or it returns
null. Obviously, the performance of PR (and its success in
producing a shorter route than DR) depends on the proxy
returned by GP and how GP is implemented.

Ideally GP would instantly compute a unique proxy sub-G-Cell
$D^c_{m-1}$, if it exists, such that the proxy route through
$D^c_{m-1}$ is the shortest one possible. Such an algorithm is
unknown to us.

Our strategy, however, is widely applicable, as regards
different connection rules and path diversity. Every version of
GP that we explore is of the following form. Let (src, dst, m)
be the inputs to GP. If $m = 0$, GP outputs null; otherwise,
let $m > 0$, so that src is in $D^a_{m-1}$ and dst is in $D^b_{m-1}$
for some $a$ not equal to $b$. GP computes a set of candidate
proxies, $\{D^{c_0}_{m-1}, \ldots, D^{c_{p-1}}_{m-1}\}$ (taken from the set of all
potential proxy G-Cell$_{m-1,n}$s), and then finds a $c_i$ for which
the path in Expression (1) is shortest (replacing $c$ by $c_i$), by
constructing the paths explicitly. If the set of candidate proxies
is empty, then GP returns null.

The key observation is that we must minimise the number of
candidate proxies $D^{c_i}_{m-1}$ in order to reduce the search space. Our goal
is to identify and evaluate general techniques towards this end,
and not to catalogue all of the ways to tune GP. Some more
complicated techniques are avoided because there is no room
to discuss them in this paper; for example, when routing in a
G-Cell$_{k,n}$ we only apply PR at the top level, whereas slightly
shorter paths can be obtained, on average, by using proxy
routes in the recursive calls to PR at Expression (1). Other
techniques are avoided because they are evidently unprofitable;
for example, a much larger search is encountered if GP
computes proxy paths for each proxy candidate. We describe
three strategies for generating the candidate proxies below.

1) GP_E as an exhaustive search: A proxy
DCell$_{m-1,n}$ $D^c_{m-1}$ can be obtained, naively, if GP is
implemented as an exhaustive search; that is, we perform
the steps described in Section V-A for every $c$ in
$\{0, 1, \ldots, g_m - 1\} \setminus \{a, b\}$. Measuring the length of each
proxy route has an associated cost, but GP_E provides the
optimal proxy route (with top-level proxies only), against which
we test the two strategies given below.
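
The exhaustive strategy, applied only at the level where src and dst first lie in different copies, can be sketched for DCell as follows. The sketch assumes the DCell connection rule of Section III-A, uses dimensional routes for all three sub-paths, and works on uid values; `dcell_t`, `link`, `dr` and `pr_exhaustive` are hypothetical names, so this is an illustration of the idea rather than the implementation used in the experiments.

```python
def dcell_t(k, n):
    """t[i] = number of server-nodes in DCell_{i,n}."""
    t = [n]
    for _ in range(k):
        t.append((t[-1] + 1) * t[-1])
    return t


def link(a, b, tp):
    """Endpoints (in copy a, in copy b) of the unique link joining copies a != b."""
    x, y = min(a, b), max(a, b)
    u, v = y - 1 + x * tp, x + y * tp  # DCell rule for copies x < y
    return (u, v) if a < b else (v, u)


def dr(src, dst, k, t):
    """Dimensional route between uids src and dst, as a list of server uids."""
    if src == dst:
        return [src]
    if k == 0:
        return [src, dst]
    tp = t[k - 1]
    a, b = src // tp, dst // tp
    if a == b:
        return [a * tp + w for w in dr(src % tp, dst % tp, k - 1, t)]
    u, v = link(a, b, tp)
    return dr(src, u, k, t) + dr(v, dst, k, t)


def pr_exhaustive(src, dst, k, t):
    """Best route among DR and all proxy routes through a single top-level proxy copy."""
    if k == 0 or src == dst:
        return dr(src, dst, k, t)
    tp = t[k - 1]
    a, b = src // tp, dst // tp
    if a == b:  # descend until src and dst lie in different copies
        return [a * tp + w for w in pr_exhaustive(src % tp, dst % tp, k - 1, t)]
    best = dr(src, dst, k, t)
    for c in range(tp + 1):  # every copy index; a and b themselves are skipped
        if c in (a, b):
            continue
        a_c, c_a = link(a, c, tp)
        c_b, b_c = link(c, b, tp)
        cand = dr(src, a_c, k, t) + dr(c_a, c_b, k, t) + dr(b_c, dst, k, t)
        if len(cand) < len(best):
            best = cand
    return best


t = dcell_t(2, 3)
print(dr(7, 19, 2, t))             # [7, 6, 1, 0, 12, 13, 18, 19]
print(pr_exhaustive(7, 19, 2, t))  # [7, 96, 97, 19]
```

For this pair in DCell$_{2,3}$ the dimensional route has 7 hops, whereas routing through proxy copy 8 gives a 3-hop route, the minimum possible for a proxy route.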

2) GP_I as an intelligent search: We propose a general
method for reducing the proxy search space, based on the
labels of src and dst. In particular, we look at proxies $D^c_{k-1}$
whose relationship to $D^a_{k-1}$ and $D^b_{k-1}$ is such that at least one
of the routes computed by the recursive calls to PR is confined
to a G-Cell$_{k-2,n}$ (see Fig. 1).

Fig. 1. Strategy for GP_I, where $h = k - 2$, and for GP_0, where $h = 0$:
select $c$ such that at least one sub-path is contained in a G-Cell$_{h,n}$. Solid
arcs represent links, and dashed or dotted curves represent paths.

We first give some notation. Henceforth, let $D_k$ be an
instance of G-Cell$_{k,n}$, and let DR be the dimensional routing
algorithm on G-Cell. For clarity of exposition we describe
a method for selecting a proxy $D^c_2$ when routing in a
G-Cell$_{k,n}$, with $k = 3$, but the notation extends to all $k > 1$.

Let src and dst be nodes in a G-Cell$_{3,n}$, with
$src = a_3 a_2 a_1 a_0$ and $dst = b_3 b_2 b_1 b_0$, so that
$uid_3(src) = t_2 a_3 + t_1 a_2 + t_0 a_1 + a_0$ and
$uid_3(dst) = t_2 b_3 + t_1 b_2 + t_0 b_1 + b_0$.
Let $a_3 \neq b_3$, and note that without loss of generality, we may
assume $a_3 < b_3$.

Our convention for denoting the link between two sub-G-Cells
is as follows: let $D^\alpha_2$ and $D^\beta_2$ be G-Cell$_{2,n}$s and recall
that we may write $[\alpha, uid_2(v)]$ for a node $v = \alpha v_2 v_1 v_0$ in $D^\alpha_2$,
where $uid_2(v) = t_1 v_2 + t_0 v_1 + v_0$. Let $([\alpha, \alpha^\beta], [\beta, \beta^\alpha])$ be
the link from $D^\alpha_2$ to $D^\beta_2$, with $\alpha^\beta = \alpha^\beta_2 \alpha^\beta_1 \alpha^\beta_0$, and similarly
for $\beta^\alpha = \beta^\alpha_2 \beta^\alpha_1 \beta^\alpha_0$.

GP_I builds its set of proxy candidates on the condition
that the source and destination are not near to each other. Let
$a = a_3$ and let $b = b_3$. GP_I outputs null if $[a_3, a^b]$ is a
server-node of the $D_1$ containing src or $[b_3, b^a]$ is a server-node
of the $D_1$ containing dst. That is, when $a^b_2 = a_2$ and $b^a_2 = b_2$.

Provided the above condition is avoided, we then select
a proxy $D^c_2$ to be a candidate, when $c$ is such that one of
the three sub-paths, PR(src, $[a, a^c]$) or PR($[c, c^a]$, $[c, c^b]$) or
PR($[b, b^c]$, dst), is short; specifically, if at least one of the three
sub-paths is contained inside a single G-Cell$_{1,n}$. That is, $c$
satisfies at least one of the following three properties (in a
non-trivial way; see discussion below):

    src and $[a, a^c]$ are in the same $D_1$:          $a_2 = a^c_2$      (2)
    $[c, c^a]$ and $[c, c^b]$ are in the same $D_1$:   $c^a_2 = c^b_2$    (3)
    $[b, b^c]$ and dst are in the same $D_1$:          $b^c_2 = b_2$,     (4)

where $a^c_2 = \lfloor a^c / t_1 \rfloor$, and similarly for $c^a_2$, $c^b_2$ and $b^c_2$. Clearly for any
G-Cell we can verify whether a proxy candidate $D^c_2$ satisfies
one (or more) of the Properties (2)-(4), since the numerators
are computed directly from the various connection rules of
each G-Cell. However, we wish to compute the set of values
$c$ which satisfy Properties (2)-(4) in constant time.

The floor function yields that $\lfloor a^c / t_1 \rfloor = a_2$ if, and only if,
$a_2 t_1 \leq a^c < (a_2 + 1) t_1$. It happens that for our connection
rules (see Section III), $a^c$ is piecewise linear (as a function of
$c$), and similarly for $b^c$, $c^a$, and $c^b$, with exactly three cases:
namely, $c_3 < a_3 < b_3$; $a_3 < c_3 < b_3$; and $a_3 < b_3 < c_3$
(where the case $b_3 < a_3$ is treated by swapping src and
dst). As a result of this, the set of values $c$ which satisfy
Properties (2)-(4) can be computed very efficiently for our
connection rules as the union of, at most, a constant number
of intervals (see Table I). Note that for the connection rules
explored in this paper Property (3) is redundant because it
does not narrow the search space; for certain pairs $(a, b)$, all
$c$ satisfy Property (3), while no $c$ satisfies it for other pairs.
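
For the DCell connection rule with $k = 3$, the interval computation can be sketched as below and checked against a brute-force scan. The sketch uses $a^c = c$ when $c < a_3$ and $a^c = c - 1$ when $c > a_3$ (and likewise for $b^c$), which follows from the DCell rule; it covers Properties (2) and (4) only and ignores the additional "non-trivial" filtering mentioned in the text. The function names are hypothetical.

```python
def interval_candidates(a3, a2, b3, b2, t1, g3):
    """Proxy indices c satisfying Property (2) or (4) in DCell_{3,n}, built from intervals."""
    def prop(e3, e2):
        lo, hi = e2 * t1, (e2 + 1) * t1  # e2*t1 <= e^c < (e2+1)*t1
        below = range(lo, min(hi, e3))                        # c < e3, where e^c = c
        above = range(max(lo + 1, e3 + 1), min(hi + 1, g3))   # c > e3, where e^c = c - 1
        return set(below) | set(above)
    return (prop(a3, a2) | prop(b3, b2)) - {a3, b3}


def brute_force_candidates(a3, a2, b3, b2, t1, g3):
    """Same set, obtained by testing every proxy index c directly."""
    out = set()
    for c in range(g3):
        if c in (a3, b3):
            continue
        a_c = c if c < a3 else c - 1
        b_c = c if c < b3 else c - 1
        if a_c // t1 == a2 or b_c // t1 == b2:
            out.add(c)
    return out


# DCell_{3,3}: t1 = 12 server-nodes per DCell_1, g3 = 157 copies of DCell_2.
cands = interval_candidates(a3=5, a2=0, b3=100, b2=3, t1=12, g3=157)
print(len(cands))  # 24
print(cands == brute_force_candidates(5, 0, 100, 3, 12, 157))  # True
```

Only the four range endpoints per property need to be computed, so the description of the candidate set is obtained in constant time; the sets are materialised here solely to compare against the brute-force scan.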

For the case $k = 3$ and the connection rules for DCell,
$\beta$-DCell, and FiConn, GP_I considers a small set with around
$t_1$ or $2t_1$ candidate proxies. More generally, a close inspection
of Properties (2) and (4) reveals that they each yield exactly
$t_{k-2}$ (possibly disjoint) candidate proxies for Generalized DCell
and at most $t_{k-2}$ candidate proxies for FiConn. Due to space
constraints we omit a full discussion of this, but we remark that
a better understanding of this aspect of proxy routes may shed
light on the sophisticated relationship between the connection
rule and various distance metrics on G-Cell.

3) GP_0 level-0 proxy search: We note that for a
G-Cell$_{k,n}$, with $k = 2$, the proxy candidates computed by
GP_I are simply those for which $a^c$ is in the same copy of
G-Cell$_{0,n}$ as src or $b^c$ is in the same copy of G-Cell$_{0,n}$ as dst
or $c^a$ and $c^b$ are in the same copy of G-Cell$_{0,n}$. GP_0 mimics
GP_I, but computes the set of proxies that satisfy at least one
of the aforementioned properties, in place of Properties (2)-(4).
It is applied only to G-Cell$_{k,n}$ with $k > 2$.

4) Implementation notes: The savings in hop-length and the
benefit to load-balancing come at the cost of searching proxy
candidates, whose number is given by $p$ in Fig. 3. For each
proxy candidate $c$, the lengths of sub-paths PR(src, $[a, a^c]$),
PR($[c, c^a]$, $[c, c^b]$) and PR($[b, b^c]$, dst) must be computed; hence
the reason for devising GP_I and GP_0 with the object of
minimising the number of candidates. Once GP* is “tuned” to suit a certain application
and network size, however, there are several choices for how
it can be implemented. How exactly this is done depends on
the size of the network and the nature of the application, but
we shall remind ourselves of some of the available tools.

The most naive method is to compute the route at the
source-node, by computing the candidate paths explicitly and
measuring their length; however, other methods such as table
look-ups must also be considered.

GP_I, in particular, leverages the fact that G-Cell$_{k,n}$s
grow double-exponentially in $k$ in order to find proxy candidates
that are linked to the same copy of G-Cell$_{k-2,n}$
as src or dst. This has a secondary benefit; namely,
G-Cell$_{k-2,n}$ (and even G-Cell$_{k-1,n}$) is small, relative to
G-Cell$_{k,n}$, and this makes table look-ups feasible for storing
the lengths of paths within each copy of G-Cell$_{k-2,n}$, and
possibly within each copy of G-Cell$_{k-1,n}$. The whole table
must be replicated at each server-node to be used this way,
but this is still much smaller than storing every (src, dst)-pair.
For example, there are $24{,}492^2 = 599{,}858{,}064$ such pairs in
DCell$_{3,3}$, and $g_3 t_2^2 = 157 \cdot 156^2 = 3{,}820{,}752$ pairs confined
to sub-DCell$_{2,3}$s, and $g_3 g_2 t_1^2 = 157 \cdot 13 \cdot 12^2 = 293{,}904$ pairs
confined to sub-DCell$_{1,3}$s (see Table II).
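
The pair counts quoted above follow directly from the recurrences $g_i = t_{i-1} + 1$ and $t_i = g_i t_{i-1}$. The short sketch below recomputes them for DCell$_{3,3}$; `dcell_sizes` is a hypothetical helper name.

```python
def dcell_sizes(k, n):
    """Return (t, g): t[i] = number of server-nodes in DCell_{i,n}, g[i] = t[i-1] + 1."""
    t, g = [n], [1]
    for i in range(1, k + 1):
        g.append(t[i - 1] + 1)
        t.append(g[i] * t[i - 1])
    return t, g


t, g = dcell_sizes(3, 3)
all_pairs = t[3] ** 2                       # every (src, dst)-pair in DCell_{3,3}
pairs_in_dcell2 = g[3] * t[2] ** 2          # pairs confined to one sub-DCell_{2,3}
pairs_in_dcell1 = g[3] * g[2] * t[1] ** 2   # pairs confined to one sub-DCell_{1,3}
print(all_pairs, pairs_in_dcell2, pairs_in_dcell1)  # 599858064 3820752 293904
```

A table restricted to sub-DCell$_{1,3}$s is thus roughly 2,000 times smaller than a table of all pairs.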

In addition to table look-ups, we also leverage the fact
that paths are computed for flows, rather than packets, and in
certain applications may be re-used for multiple flows among a
set of server-nodes that is small, relative to the entire network.
In addition, each time we compute a proxy path, we may
identify multiple viable proxies (the context of the application
and network size defines what this means), and hence, path
diversity comes at no extra cost. We may choose from several
paths at random, send a probe packet to explore the loads and
possible faults on each path before sending a larger flow, or
remember proxies for common and recent destinations.

TABLE I
Properties (2)-(4) applied to DCell$_{3,n}$.

route \ c                         | $c_3 < a_3 < b_3$                                              | $a_3 < c_3 < b_3$                                          | $a_3 < b_3 < c_3$
$a_3 a_2 a_1 a_0$ to $[a, a^c]$   | $\lfloor c_3/t_1 \rfloor = a_2$                                | $\lfloor (c_3-1)/t_1 \rfloor = a_2$                        | $\lfloor (c_3-1)/t_1 \rfloor = a_2$
$[c, c^a]$ to $[c, c^b]$          | $\lfloor (a_3-1)/t_1 \rfloor = \lfloor (b_3-1)/t_1 \rfloor$    | $\lfloor a_3/t_1 \rfloor = \lfloor (b_3-1)/t_1 \rfloor$    | $\lfloor a_3/t_1 \rfloor = \lfloor b_3/t_1 \rfloor$
$[b, b^c]$ to $b_3 b_2 b_1 b_0$   | $\lfloor c_3/t_1 \rfloor = b_2$                                | $\lfloor c_3/t_1 \rfloor = b_2$                            | $\lfloor (c_3-1)/t_1 \rfloor = b_2$

VI. Experiments 

A. Experimental setup 

We compare up to five different routing algorithms for
various G-Cells. They are: DR; shortest paths, computed by
a breadth-first search (BFS); PR with GP_E; PR with GP_I;
and PR with GP_0. Each routing algorithm (for a given
DCN) is tested with the same 10,000 input pairs, (src, dst).
The estimated standard error of the mean is computed by
$s_x / \sqrt{trials}$, where $s_x$ is the sample standard deviation and
$trials = 10{,}000$. For our purposes of surveying the effects
of different instances of GP, this value is negligible, and we
therefore omit error bars in Figs. 2-3.

For each algorithm we plot $100(\bar{x}_{DR} - \bar{x})/\bar{x}_{DR}$ in Fig. 2,
where $\bar{x}$ is the mean hop-length in the sample of computed
routes. In other words, we plot the percent savings in hop-length
over DR. Note that GP_0 is implicitly plotted for $k = 2$
because it is equivalent to GP_I in this case.
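
The two statistics used here can be computed as in the following sketch. The function names and the tiny hop-length samples are made up for illustration only; they are not experimental data.

```python
import math
import statistics


def standard_error(samples):
    """Estimated standard error of the mean: s_x / sqrt(trials)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def percent_savings(dr_lengths, lengths):
    """Percent savings in mean hop-length over DR: 100 * (mean_DR - mean) / mean_DR."""
    mean_dr = statistics.mean(dr_lengths)
    return 100 * (mean_dr - statistics.mean(lengths)) / mean_dr


# Hypothetical hop-lengths for the same four (src, dst) pairs under DR and under PR.
dr_hops = [7, 7, 5, 3]
pr_hops = [3, 7, 5, 3]
print(round(percent_savings(dr_hops, pr_hops), 2))
print(round(standard_error(pr_hops), 3))
```

`statistics.stdev` is the sample standard deviation (dividing by trials minus one), which matches the definition of $s_x$ used above.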

We also plot, in Fig. 3, the mean number of proxies considered
by GP_I and GP_0, denoted p_I and p_0, respectively,
and the mean number of routes PR(src, dst) found to be no
longer than DR(src, dst), denoted r_I and r_0, respectively.
Note that p_I = p_0 for $k = 2$ and, as such, this value is
implicitly plotted for $k = 2$ in Fig. 3.

The two histograms in Fig. 4 show the proportion of links
with a given load (number of flows) in $\beta$-DCell$_{3,3}$, under 1
million one-to-one communications, generated uniformly at
random; one histogram is for DR and the other one is for
PR with GP_I.

The networks we tested are given with their basic properties
in Table II, and the details of each version of GP* are given
in Section V-A.

B. Evaluation 

The plots in Fig. 2 show that for many G-Cell topologies,
significant savings in hop-length can be made over
dimensional routes by using proxy routes, depending on the
connection rule, network size, and the parameters $k$ and $n$. It is
immediate that GP_I and GP_0 retain some good proxies, in
relation to GP_E, which tries all of them. Furthermore, GP_E
is comparable to BFS. Fig. 3 tells us how much searching each
of the methods GP_I and GP_0 must do, and how much path
diversity they create, on average.

Note that the means plotted in Figs. 2-3 hide the success
rate of PR in finding a good proxy path; as a typical example,
PR(src, dst) is shorter than DR(src, dst) for approximately
30% of input pairs when using GP_I in a DCell with $k = 3$.

We highlight (and explain, where possible) some of the
trends observable in the plot of Fig. 2. In general, proxy
routes are more effective in $\beta$-DCell$_{k,n}$ than in DCell$_{k,n}$ and
FiConn$_{k,n'}$ of comparable size, with fixed $k$; however, even
FiConn$_{k,n'}$ still sees up to a 6-7% improvement.

The apparent weakness of PR in FiConn is partly explained
by the fact that for given $k$ and $n$, there are fewer proxy
FiConn$_{m-1,n}$s to consider at level $m$. On the other hand
we find that GP_0 considers fewer than $g_1 = 6$ proxies for
FiConn$_{3,10}$, while it considers more than $g_1 = 7$ proxies for
DCell$_{3,6}$ and $\beta$-DCell$_{3,6}$. In addition, there are an equal number
of potential proxy candidates in $\beta$-DCell$_{k,n}$ and DCell$_{k,n}$
in general, yet GP_E, GP_I, and GP_0 invariably consider
more proxy candidates for DCell$_{k,n}$, only to produce proxy
paths that perform better in $\beta$-DCell$_{k,n}$. We must conclude
that the connection rule and topology (FiConn vs Generalized
DCell) profoundly impact the performance of our proxy
routing algorithms. This is somewhat unsurprising, however,
since the connection rule and topology also affect the shortest
paths; for example, the mean distance in $\beta$-DCell$_{3,3}$ is far
shorter than in DCell$_{3,3}$ (see also [12]).

Proxy paths in larger networks (when increasing n) are 
worse than those in smaller networks, for each DCN with fixed 
k; for example, DCell_{3,3} and DCell_{3,6}, and also FiConn_{3,10} and 
FiConn_{3,16}. 

A related trend appears to be that for each family of 
DCNs, proxy-path-savings increase with k, in every version 
of GP*; for example, FiConn_{3,10} and FiConn_{4,6}. The main 
reason for this is that the performance of BFS, relative to 
DR, also increases with k, thus providing a greater margin for 
improvement by using PR. 

The difference between GP_I and GP_0 grows with k (note 
that for k = 2, they are the same, and hence GP_0 is not 
plotted for k = 2). This is because GP_I looks for sub-paths 
within a copy of 0-Cell_{k-2,n}, whereas GP_0 looks 
for sub-paths within a copy of 0-Cell_{0,n}, and as the gap 
between 0 and k - 2 increases, GP_I considers a larger set 
of proxy candidates. Similarly, we explain how the difference 
between GP_E and GP_I grows with k, but here it is the 
double exponential growth of 0-Cell that contributes extra 
proxy candidates to GP_E, since the search space for GP_I 
is proportional to g_{k-1}, whereas GP_E considers exactly g_k 
proxy candidates (see Table II). Most notable, however, is the 
fact that for 0-Cell_{2,*}, the performance of GP_E is almost 
identical to the performance of GP_I; whereas DCell_{2,43} has 
g_1 = 44 and g_2 = 1893, our results show that optimal proxies 
are nevertheless considered by GP_I (and hence, GP_0). 

DCN      N          N/n       |E|        d    g_1   g_2    g_3
F2,36    117648     3268      161766     7    19    172    -
F2,48    361200     7525      496650     7    25    301    -
F3,10    116160     11616     166980     15   6     16     121
F3,16    3553776    222111    5108553    15   9     37     667
F4,6     857472     142912    1259412    31   4     7      22
F4,8     37970240   4746280   55768790   31   5     11     56
D2,18    117306     6517      234612     7    19    343    -
D2,43    3581556    83292     7163112    7    44    1893   -
D3,3     24492      8164      61230      15   4     13     157
D3,6     3263442    543907    8158605    15   7     43     1807

TABLE II 

Properties of the DCNs in our experiments. We use F to abbreviate FiConn, and D to abbreviate (β-)DCell. 
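The values of g_1, g_2, g_3 and N quoted here and in Table II can be reproduced with a short sketch. The recurrences used are the ones commonly given in the original DCell and FiConn definitions (they are not restated in this section, so treat them as an assumption): with t_0 = n servers in a level-0 cell, DCell uses g_m = t_{m-1} + 1 copies at level m, FiConn uses g_m = t_{m-1}/2^m + 1 copies, and in both cases t_m = g_m t_{m-1}. The function name copies_and_servers is hypothetical.

```python
def copies_and_servers(family, k, n):
    """Return ([g_1, ..., g_k], [t_0, ..., t_k]) for a level-k network on n-port switches."""
    if family not in ("dcell", "ficonn"):
        raise ValueError("family must be 'dcell' or 'ficonn'")
    servers = [n]  # t_0: servers in a level-0 cell
    copies = []    # g_m: number of level-(m-1) cells in a level-m cell
    for level in range(1, k + 1):
        prev = servers[-1]
        if family == "dcell":
            g = prev + 1
        else:
            # FiConn uses half of the idle ports left at the previous level.
            if prev % (2 ** level) != 0:
                raise ValueError("parameters do not give an integral FiConn")
            g = prev // (2 ** level) + 1
        copies.append(g)
        servers.append(g * prev)
    return copies, servers


dcell_2_43 = copies_and_servers("dcell", 2, 43)
ficonn_3_10 = copies_and_servers("ficonn", 3, 10)
```

Under these assumptions, dcell_2_43 gives g_1 = 44, g_2 = 1893 and N = 3581556, and ficonn_3_10 gives g_1 = 6, g_2 = 16, g_3 = 121 and N = 116160, matching the corresponding rows of Table II.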

Although GP* is effective in computing shorter paths and 
comes fairly close to BFS (typically over 80% of the savings 
are obtained with PR), we can confirm that the shortest paths 
for these topologies are not, in general, a proxy route of the 
form we are considering in this paper, as sometimes (e.g. 
(β-)DCell_{3,3}) this difference is considerable. This was expected, 
and provides motivation to explore novel general routing 
algorithms of order 3 and higher in future work. 
Fig. 2. Percent mean hop-length savings over DR. 

Another benefit of proxy routing is that it also yields some 
path diversity which can be exploited for load balancing and 
fault-tolerance purposes. This can be seen in Fig. 3, where 
r is the number of distinct (but not necessarily disjoint) 
paths considered by PR(src, dst) that are no longer than 
DR(src, dst). Additional data must be studied, however, to 
determine exactly how r affects the load-balancing properties 
of the network. 

Fig. 3. Mean number of candidate proxies p, and mean number of routes 
no longer than DR(src, dst), r. 

We computed histograms that show the proportion of links 
with a given load, under 1 million one-to-one communications, 
plotted in Fig. 4. The histogram for GP_I is shifted left 
relative to the histogram for DR, meaning that many links carry 
less load than in the same scenario for DR. In addition, the 
maximum load is reduced (in our sample), suggesting many 
0-Cells have a higher aggregate bottleneck throughput (ABT, 
introduced in CQl, and closely related to the most heavily 
loaded link in the network) with PR than with DR. 
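The link-load histograms described above can be sketched as follows. The four-node ring, the routes, and the function names link_loads and normalised_histogram are illustrative assumptions; we also assume that links are undirected and that the load of a link is the number of flows routed across it.

```python
from collections import Counter


def link_loads(routes):
    """Count how many flows cross each undirected link."""
    loads = Counter()
    for route in routes:
        for u, v in zip(route, route[1:]):
            loads[frozenset((u, v))] += 1
    return loads


def normalised_histogram(loads, links):
    """Map each load value to the proportion of links carrying that load."""
    per_load = Counter(loads[frozenset(link)] for link in links)
    return {load: count / len(links) for load, count in sorted(per_load.items())}


# Hypothetical toy network: a ring a-b-c-d-a with three flows.
links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
single_path_routes = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
diverse_routes = [["a", "d", "c"], ["a", "b"], ["b", "c"]]

single_loads = link_loads(single_path_routes)
diverse_loads = link_loads(diverse_routes)
single_hist = normalised_histogram(single_loads, links)
diverse_hist = normalised_histogram(diverse_loads, links)
```

In this toy case, moving one flow onto an equally long alternative path reduces the maximum link load from 2 to 1 and shifts the histogram to the left, which is the qualitative effect reported for GP_I relative to DR.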

Note that our primary focus is to reduce hop-length and 
implementation overheads of GP, and that we could increase 
path diversity even more if we were willing to route on longer 
paths than DR(src, dst); we do not do this here, but will 
explore this possibility in future research. 

C. Significance 

Various aspects of routing in a DCN depend heavily on the 
availability of short one-to-one paths; for example, minimising 
latency and energy usage, and building fault-tolerant and 
load-balancing routing algorithms. 

Fig. 4. Normalised histograms showing the proportion of links with a given 
load (number of flows), comparing DR with PR using GP_I in β-DCell_{3,3}. 

While there are inherent trade-offs in computing short 
proxy routes, there are also multiple benefits: using shorter 
one-to-one paths in a DCN reduces the average latency of 
communications, the aggregate load, and thereby the energy 
usage; and we obtain a non-deterministic path diversity at no 
extra cost while computing these paths, which can be used either 
adaptively or randomly to deal with faults and congestion, in 
addition to forming the building blocks of other fault-tolerant 
and load-balancing routing algorithms (such as the way DR is 
used in DFR and TAR). As such, proxy routes are not only 
a good candidate for replacing DR in (Generalized) DCell 
and FiConn, they are also effective at performing some of 
the functions of the known adaptive routing algorithms for 
these networks, namely DFR and TAR, while simultaneously 
producing short paths. 
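One way the path diversity could be used randomly to avoid faults is sketched below. This is only an illustration under stated assumptions, not the behaviour of DFR, TAR, or our implementation: routes are lists of nodes, faulty links are known in advance, only candidates no longer than the dimensional route are eligible, and the function name pick_route is hypothetical.

```python
import random


def pick_route(dr_route, proxy_routes, faulty_links=frozenset(), rng=random):
    """Pick, at random, a fault-free route that is no longer than the DR route."""

    def usable(route):
        # A route is usable if none of its hops is a known faulty link.
        return all(frozenset(hop) not in faulty_links
                   for hop in zip(route, route[1:]))

    limit = len(dr_route) - 1  # hop-length of the dimensional route
    candidates = [route for route in [dr_route, *proxy_routes]
                  if len(route) - 1 <= limit and usable(route)]
    return rng.choice(candidates) if candidates else None


# Made-up routes between hypothetical nodes "s" and "t".
dr = ["s", "a", "b", "t"]
proxies = [["s", "c", "t"], ["s", "d", "e", "f", "t"]]
faults = {frozenset(("a", "b"))}

chosen = pick_route(dr, proxies, faults)
```

With the link between a and b faulty, the dimensional route is excluded and the 4-hop proxy route is too long, so the only remaining candidate is the 2-hop proxy route. Without faults, the choice is made at random between the dimensional route and that proxy route.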

VII. Conclusions and Future Research 

In this paper we have shown that the topologies of the DCNs 
Generalized DCell and FiConn are completely connected 
recursively-defined networks. As such, we characterised all 
possible routes (with no repeated nodes) on these networks 
and then proposed the family of routing algorithms PR to 
compute proxy routes; that is, general routes of order 3. We 
detailed three instances of this family, GP_E, GP_I, and 
GP_0, where each one considers a number of candidate proxy 
sub-structures, and selects the optimal proxy to route through. 
We performed an analytical and empirical comparison between 
these, shortest paths, and the previously known dimensional 
routes, as regards mean hop-length. The main results of our 
experiments are that significant savings in hop-length can be 
made over dimensional routes by using proxy routes, even 
with only a relatively small set of candidate proxies, and that 
the amount of savings depends on connection rule, network 
size, and the parameters k and n. 

In future research we will perform a deeper analysis of 
the DCNs in question, with two major goals. The first one, 
motivated by the fact that GP_I sometimes discards the 
optimal proxy candidate, calls for a closer inspection of the 
topologies. We want to both find the optimal proxy candidates, 
and reduce the size of the search space. 


Furthermore, whereas this paper is focused on dimensional 
and proxy routing, there may be cases where no shortest path 
between two server-nodes is a dimensional route or a proxy 
route. Note that whilst a given shortest path may be found not 
to be a dimensional or proxy route, this does not preclude other 
paths with the same terminal nodes from being dimensional 
or proxy routes. A deeper mathematical analysis of the DCNs 
in question may shed light on (1) whether or not higher-order 
routing algorithms are needed, and (2) how to compute optimal 
routes of this type efficiently. 